
211 research outputs found

Exploring Prediction Uncertainty in Machine Translation Quality Estimation

Author: Beck Daniel
Cohn Trevor
Specia Lucia
Publication date: 01/01/2016

Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty. However, models in this task are traditionally evaluated only in terms of point estimate metrics, which do not take prediction uncertainty into account. We investigate probabilistic methods for Quality Estimation that can provide well-calibrated uncertainty estimates and evaluate them in terms of their full posterior predictive distributions. We also show how this posterior information can be useful in an asymmetric risk scenario, which aims to capture typical situations in translation workflows. Comment: Proceedings of CoNLL 201

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Bridging the gap between folksonomies and the semantic web: an experience report

Author: Angeletou Sofia
Motta Enrico
Sabou Marta
Specia Lucia
Publication date: 01/01/2007

Abstract. While folksonomies allow tagging of similar resources with a variety of tags, their content retrieval mechanisms are severely hampered by being agnostic to the relations that exist between these tags. To overcome this limitation, several methods have been proposed to find groups of implicitly inter-related tags. We believe that content retrieval can be further improved by making the relations between tags explicit. In this paper we propose the semantic enrichment of folksonomy tags with explicit relations by harvesting the Semantic Web, i.e., dynamically selecting and combining relevant bits of knowledge from online ontologies. Our experimental results show that, while semantic enrichment needs to be aware of the particular characteristics of folksonomies and the Semantic Web, it is beneficial for both.

CiteSeerX

Open Research Online (The Open University)

Complex Word Identification: Challenges in Data Annotation and System Performance

Author: Malmasi Shervin
Paetzold Gustavo
Specia Lucia
Zampieri Marcos
Publication date: 13/10/2017

This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. Furthermore, we analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and one of the reasons for that is the way in which human annotation was performed. Comment: Proceedings of the 4th Workshop on NLP Techniques for Educational Applications (NLPTEA 2017)

arXiv.org e-Print Archive

ZENODO

A review of translation tools from a post-editing perspective

Author: Nunes Vieira Lucas
Specia Lucia
Publication date: 14/10/2011

Explore Bristol Research

Multi-modal Context Modelling for Machine Translation

Author: Specia Lucia
Publication venue: European Association for Machine Translation
Publication date: 01/01/2018

MultiMT is a European Research Council Starting Grant whose aim is to devise data, methods and algorithms to exploit multi-modal information (images, audio, metadata) for context modelling in machine translation and other cross-lingual tasks. The project draws upon different research fields including natural language processing, computer vision, speech processing and machine learning.

Repositorio Institucional de la Universidad de Alicante

Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words

Author: Paetzold Gustavo Henrique
Specia Lucia

Conference paper: Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words

ZENODO

Revisiting Contextual Toxicity Detection in Conversations

Author: Anuchitanukul Atijit
Ive Julia
Specia Lucia
Publication date: 23/04/2022

Understanding toxicity in user conversations is undoubtedly an important problem. Addressing "covert" or implicit cases of toxicity is particularly hard and requires context. Very few previous studies have analysed the influence of conversational context in human perception or in automated detection models. We dive deeper into both these directions. We start by analysing existing contextual datasets and come to the conclusion that toxicity labelling by humans is in general influenced by the conversational structure, polarity and topic of the context. We then propose to bring these findings into computational detection models by introducing and evaluating (a) neural architectures for contextual toxicity detection that are aware of the conversational structure, and (b) data augmentation strategies that can help model contextual toxicity detection. Our results have shown the encouraging potential of neural architectures that are aware of the conversation structure. We have also demonstrated that such models can benefit from synthetic data, especially in the social media domain.

arXiv.org e-Print Archive